Two-tier Architecture for Domain Specific Document Summarization Using Probabilistic Latent Semantic Analysis
Authors
Abstract
In this work we propose a two-tier architecture for document summarization. The architecture reduces redundancy and increases the relevance of the information in the summary by applying Probabilistic Latent Semantic Analysis (PLSA) at two levels. It also speeds up the summarizer by using the Incremental Expectation Maximization algorithm for PLSA learning rather than standard Expectation Maximization. The system first collects topical information from multiple news portals and applies PLSA for single-document summarization. At the next level, PLSA is applied again, this time for multiple-document summarization, to produce the final summary. The two tiers thus correspond to single-document and multiple-document summarization. We report a summarization experiment and briefly present its results. We ran the experiment several times on a variety of documents and observed that summaries are generated within a few seconds and that their quality is fairly improved. We validate the results with the ROUGE metrics, and the validation results demonstrate the effectiveness of the proposed architecture.
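As a rough illustration of the PLSA learning step mentioned in the abstract, the following is a minimal sketch of PLSA fitted with plain batch Expectation Maximization on a small document-term count matrix. It is not the authors' implementation and it does not use the Incremental EM variant the paper describes. The function name `plsa_em`, its parameters, and the toy counts are all hypothetical, chosen only for this example.

```python
import random


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with batch EM (hypothetical helper, not the paper's code).

    counts: list of rows, counts[d][w] = occurrences of word w in document d.
    Returns (p_z_d, p_w_z): P(z|d) for each document, P(w|z) for each topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:
            # Fall back to a uniform distribution for empty rows.
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialisation, each row normalised to sum to 1.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    # Accumulate expected counts n(d,w) * P(z|d,w).
                    resp = n_dw * joint[z] / total
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy data (assumed for illustration): four documents over a four-word vocabulary.
counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z_d, p_w_z = plsa_em(counts, n_topics=2, n_iter=30)
```

In a summarizer along the lines described, the rows of the count matrix could be sentences rather than whole documents, so that each sentence gets a topic distribution that can be used for ranking. That mapping is an assumption here, not a detail stated in the abstract.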
Related resources
Topic-based Multi-Document Summarization with Probabilistic Latent Semantic Analysis
We consider the problem of query-focused multidocument summarization, where a summary containing the information most relevant to a user’s information need is produced from a set of topic-related documents. We propose a new method based on probabilistic latent semantic analysis, which allows us to represent sentences and queries as probability distributions over latent topics. Our approach comb...
Personalized Multi-Document Summarization using N-Gram Topic Model Fusion
We consider the problem of probabilistic topic modeling for query-focused multi-document summarization. Rather than modeling topics as distributions over a vocabulary of terms, we extend the probabilistic latent semantic analysis (PLSA) approach with a bigram language model. This allows us to relax the conditional independence assumption between words made by standard topic models. We present a...
Multi-document Summarization using Probabilistic Topic-based Network Models
Multi-document summarization has obtained much attention in the research domain of text summarization. In the past, probabilistic topic models and network models have been leveraged to generate summaries. However, previous studies do not investigate different combinations of various topic models and network models. This paper describes an integrated approach considering both probabilistic topic...
Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms
This paper proposes an improved approach for spoken lecture summarization, in which random walk is performed on a graph constructed with automatically extracted key terms and probabilistic latent semantic analysis (PLSA). Each sentence of the document is represented as a node of the graph and the edge between two nodes is weighted by the topical similarity between the two sentences. The basic i...
Integrating Clustering and Multi-Document Summarization by Bi-Mixture Probabilistic Latent Semantic Analysis (PLSA) with Sentence Bases
Probabilistic Latent Semantic Analysis (PLSA) has been popularly used in document analysis. However, as it is currently formulated, PLSA strictly requires the number of word latent classes to be equal to the number of document latent classes. In this paper, we propose Bi-mixture PLSA, a new formulation of PLSA that allows the number of latent word classes to be different from the number of late...
Journal title:
Volume, issue:
Pages: -
Publication date: 2012